2,140 research outputs found

    A new approach for sizing trials with composite binary endpoints using anticipated marginal values and accounting for the correlation between components

    Composite binary endpoints are increasingly used as primary endpoints in clinical trials. When designing a trial, it is crucial to determine the appropriate sample size for testing the statistical differences between treatment groups for the primary endpoint. As shown in this work, when using a composite binary endpoint to size a trial, one needs to specify the event rates and the effect sizes of the composite components as well as the correlation between them. In practice, the marginal parameters of the components can be obtained from previous studies or pilot trials; however, the correlation is often not reported and is thus usually unknown. We first show that the sample size for composite binary endpoints depends strongly on the correlation and, second, that slight deviations in the prior information on the marginal parameters may result in trials that are underpowered for achieving the study objectives at a pre-specified significance level. We propose a general strategy for calculating the required sample size when the correlation is not specified, while accounting for uncertainty in the marginal parameter values. We present the web platform CompARE, which characterizes composite endpoints and calculates the sample size as proposed in this paper. We evaluate the performance of the proposal with a simulation study and illustrate it with a real case study using CompARE.
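To make the dependence on the correlation concrete, here is a minimal sketch. It is not the paper's or CompARE's procedure: it assumes two binary components with Pearson correlation `rho`, derives the composite event probability from the marginals, and then applies the textbook normal-approximation sample size for comparing two proportions. The function names and example values are hypothetical.

```python
from math import sqrt, ceil
from statistics import NormalDist


def composite_prob(p1, p2, rho):
    """P(E1 or E2) for two binary events with marginals p1, p2 and Pearson correlation rho."""
    # Joint probability implied by the correlation: P(E1 and E2) = p1*p2 + rho*sd1*sd2
    joint = p1 * p2 + rho * sqrt(p1 * (1 - p1) * p2 * (1 - p2))
    # The joint probability must respect the Frechet bounds for the given marginals
    lower, upper = max(0.0, p1 + p2 - 1), min(p1, p2)
    if not (lower - 1e-12 <= joint <= upper + 1e-12):
        raise ValueError("rho is not attainable for these marginal probabilities")
    return p1 + p2 - joint


def n_per_group(p_control, p_treated, alpha=0.05, power=0.80):
    """Per-group sample size, two-sided test of two proportions (unpooled variance)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = p_control * (1 - p_control) + p_treated * (1 - p_treated)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p_control - p_treated) ** 2)


if __name__ == "__main__":
    # Hypothetical marginals: (control, treated) event rates for each component
    for rho in (0.0, 0.2, 0.4):
        pc = composite_prob(0.10, 0.20, rho)   # control arm
        pt = composite_prob(0.07, 0.16, rho)   # treated arm, same rho assumed
        print(rho, round(pc, 4), round(pt, 4), n_per_group(pc, pt))
```

Running the loop over several values of `rho` shows how the required sample size moves when only the assumed correlation changes, which is the sensitivity the abstract describes.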

    Chinese–Spanish neural machine translation enhanced with character and word bitmap fonts

    Recently, machine translation systems based on neural networks have reached state-of-the-art results for some language pairs (e.g., German–English). In this paper, we investigate the performance of neural machine translation for Chinese–Spanish, which is a challenging language pair. Given that the meaning of a Chinese word can be related to its graphical representation, this work aims to enhance neural machine translation by using as input a combination of words or characters and their corresponding bitmap fonts. Interpreting every word or character as a bitmap font yields more informed vector representations. The best results are obtained when using words plus their bitmap fonts, with an improvement over a competitive neural MT baseline system of almost six BLEU points and five METEOR points, and a consistently better ranking in the human evaluation.

    Does cueing training improve physical activity in patients with Parkinson's disease?

    Patients with Parkinson’s disease (PD) are encouraged to stay active to maintain their mobility. Ambulatory activity monitoring (AM) provides an objective way to determine the type and amount of gait-related daily activities. Objective: To investigate the effects of a home cueing training program on functional walking activity in PD. Methods: In a single-blind, randomized crossover trial, PD patients allocated to early intervention received cueing training for 3 weeks, whereas the late intervention group received training in the following 3 weeks. Training was applied at home, using a prototype cueing device. AM was applied at baseline and at 3, 6, and 12 weeks in the patient’s home to record body movements. Postures and motions were classified as percentage of total time spent on (a) static activity, further specified as % sitting and % standing, and (b) dynamic activity, further specified as % walking and % walking periods exceeding 5 seconds (W>5s) and 10 seconds (W>10s). Random coefficient analysis was applied. Results: A total of 153 patients participated in this trial. Significant improvements were found for dynamic activity (coefficient = 4.46), W>5s (coefficient = 2.63), and W>10s (coefficient = 2.90; P < .01). All intervention effects declined significantly at 6 weeks follow-up. Conclusion: Cueing training in PD patients’ own homes significantly improves the amount of walking as recorded by AM. Treatment effects diminished after the intervention period, pointing to the need for permanent cueing devices and follow-up cueing training.

    Test-retest reliability of the Shape/Texture Identification test™ in people with chronic stroke

    To evaluate the test-retest reliability of the Shape/Texture Identification test (STI-test™) in persons with chronic stroke.

    Knee surgery and its evidence base

    Introduction: Evidence-driven orthopaedics is gaining prominence. It enables better management decisions and therefore better patient care. The aim of our study was to review a selection of the leading publications pertaining to knee surgery to assess changes in levels of evidence over a decade. Methods: Articles from the years 2000 and 2010 in The Knee; the Journal of Arthroplasty; Knee Surgery, Sports Traumatology, Arthroscopy; the Journal of Bone and Joint Surgery (American Volume); and the Bone and Joint Journal were analysed and ranked according to guidelines from the Centre for Evidence-Based Medicine. The intervening years (2003, 2005 and 2007) were also analysed to further define the trend. Results: The percentage of high-level evidence (level I and II) studies increased, albeit without reaching statistical significance. Following a significant downward trend, the latter part of the decade saw a major rise in levels of published evidence. The most frequent type of study was therapeutic. Conclusions: Although the rise in levels of evidence across the decade was not statistically significant, there was a significant drop and then rise in these levels in the interim. It is therefore important that a further study is performed to assess longer-term trends. Recent developments have made clear that high-quality evidence will have an ever-increasing influence on future orthopaedic practice. We suggest that journals implement compulsory declaration of a published study's level of evidence and that authors consider their study designs carefully to enhance the quality of available evidence.

    EveTAR: Building a Large-Scale Multi-Task Test Collection over Arabic Tweets

    This article introduces a new language-independent approach for creating a large-scale, high-quality test collection of tweets that supports multiple information retrieval (IR) tasks without running a shared-task campaign. The adopted approach (demonstrated over Arabic tweets) designs the collection around significant (i.e., popular) events, which enables the development of topics that represent frequent information needs of Twitter users for which rich content exists. This inherently facilitates support for multiple tasks that generally revolve around events, namely event detection, ad-hoc search, timeline generation, and real-time summarization. The key highlights of the approach include diversifying the judgment pool via interactive search and multiple manually crafted queries per topic, collecting high-quality annotations via crowd-workers for relevance and in-house annotators for novelty, filtering out low-agreement topics and inaccessible tweets, and providing multiple subsets of the collection for better availability. Applying our methodology to Arabic tweets resulted in EveTAR, the first freely available tweet test collection for multiple IR tasks. EveTAR includes a crawl of 355M Arabic tweets and covers 50 significant events for which about 62K tweets were judged with substantial average inter-annotator agreement (Kappa value of 0.71). We demonstrate the usability of EveTAR by evaluating existing algorithms in the respective tasks. Results indicate that the new collection can support reliable ranking of IR systems that is comparable to similar TREC collections, while providing strong baseline results for future studies over Arabic tweets.

    WiSeBE: Window-based Sentence Boundary Evaluation

    Sentence Boundary Detection (SBD) has been a major research topic since Automatic Speech Recognition transcripts have been used for further Natural Language Processing tasks like Part of Speech Tagging, Question Answering or Automatic Summarization. But what about evaluation? Are standard evaluation metrics like precision, recall, F-score or classification error, and, more importantly, the evaluation of an automatic system against a single reference, enough to conclude how well an SBD system is performing given the final application of the transcript? In this paper we propose Window-based Sentence Boundary Evaluation (WiSeBE), a semi-supervised metric for evaluating Sentence Boundary Detection systems based on multi-reference (dis)agreement. We evaluate and compare the performance of different SBD systems over a set of YouTube transcripts using WiSeBE and standard metrics. This double evaluation gives an understanding of how WiSeBE is a more reliable metric for the SBD task. Comment: In proceedings of the 17th Mexican International Conference on Artificial Intelligence (MICAI), 201
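For orientation, the standard single-reference metrics that the abstract questions can be sketched as follows. This is not WiSeBE itself, whose definition is not given here. It assumes, as a simplification, that each sentence boundary is identified by the index of the token it follows; the function name and the sample indices are hypothetical.

```python
def boundary_prf(predicted, reference):
    """Precision, recall and F1 of predicted boundary positions against one reference."""
    predicted, reference = set(predicted), set(reference)
    true_positives = len(predicted & reference)
    # Guard against empty sets so the ratios are always defined
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    # Hypothetical token indices after which a boundary is placed
    system_output = [3, 7, 12]
    single_reference = [3, 8, 12, 15]
    print(boundary_prf(system_output, single_reference))
```

Because the score is computed against one annotator's segmentation, a boundary that another annotator would accept still counts as an error, which is the limitation a multi-reference metric is meant to address.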

    Intestinal Parasites Classification Using Deep Belief Networks

    Currently, approximately 4 billion people are infected by intestinal parasites worldwide. Diseases caused by such infections constitute a public health problem in most tropical countries, leading to physical and mental disorders, and even death in children and immunodeficient individuals. Although subject to high error rates, human visual inspection is still responsible for the vast majority of clinical diagnoses. In recent years, some works have addressed intelligent computer-aided classification of intestinal parasites, but they usually suffer from misclassification due to similarities between parasites and fecal impurities. In this paper, we introduce Deep Belief Networks to the context of automatic intestinal parasite classification. Experiments conducted over three datasets composed of eggs, larvae, and protozoa provided promising results, even in the presence of unbalanced classes and fecal impurities.

    Validity and reliability of Resource Utilization Groups (RUG-III) in Finnish long-term care facilities

    Resource Utilization Groups, Version III (RUG-III) is a case-mix system developed in the USA for classification of long-term care residents. This paper examines the validity and reliability of an adapted 22-group version of RUG-III (RUG-III/22) for use in long-term care facilities in Finland. Finnish cost weights for RUG-III/22 groups are calculated and different methods for their computation are evaluated. The study sample (1,964 residents) was collected in 1995–96 from ten long-term care facilities in Finland. RUG-III/22 alone explained 38.2% of the variance of total patient-specific (nursing + auxiliary staff) per diem cost. Resource use within RUG groups was relatively homogeneous. Other predictors of resource use included age, gender and length of stay. RUG-III/22 also met the standard for good reliability (i.e. a kappa value of 0.6 or higher) for crucial classification items, such as activities of daily living, and showed high correlation between assessments based on relative cost.
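As a reminder of what a kappa threshold such as 0.6 measures, the following is a minimal sketch of Cohen's kappa for two assessors rating the same items. It assumes the two-rater, unweighted form; the abstract does not state which kappa variant the study used, and the sample ratings are hypothetical.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two equal-length sequences of categorical ratings."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("both raters must rate the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: share of items given the same label by both raters
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the raters' marginal label frequencies
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / n ** 2
    if expected == 1:
        return 1.0  # both raters used a single, identical label throughout
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    # Hypothetical assessments of four residents by two assessors
    first_assessor = [1, 1, 0, 0]
    second_assessor = [1, 0, 0, 0]
    print(cohens_kappa(first_assessor, second_assessor))
```

Kappa discounts the agreement expected by chance, so two assessors who agree on 75% of items can still fall below a 0.6 threshold when the label frequencies make chance agreement high.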